Correlations and Omori law in Spamming
Authors
Abstract
The most costly and annoying characteristic of the e-mail communication system is the large number of unsolicited commercial e-mails, known as spams, that are continuously received. By investigating the statistical properties of the intertimes between spam deliveries, we show that spams delivered to a given recipient are time correlated: if the intertime between two consecutive spams is small (large), then the next spam will most probably arrive after a small (large) intertime. Spam temporal correlations are reproduced by a numerical model based on the random superposition of spam sequences, each one described by the Omori law. This and other experimental findings suggest that statistical approaches may be used to infer how spammers operate.
Introduction. – Quoting from Ref. [1], a press release of the European Union: "The proliferation of unsolicited commercial e-mail, or 'spam', has reached a point where it creates a major problem for the development of e-commerce and the Information Society. Businesses and individuals spend an increasing amount of time and money simply to clean up e-mailboxes. The loss in productivity for EU businesses has been estimated at €2.5 billion for 2002. [. . .] Spam has the potential of destroying some of the major benefits brought about by services such as email and SMS." Spams, defined as undesired commercial e-mails, are estimated to make up 70–80% of all e-mails [2], as anyone who opens their mailbox has probably noticed. Such a large number can be explained by considering that a spammer's daily earnings are proportional to the number of spams sent. To reduce the nuisance caused by spams, an enormous effort has been devoted to designing efficient spam filters (see [3] and references therein) that can quickly discriminate between a spam and a legitimate e-mail. Much less effort has been devoted to understanding how spammers operate, which is the crucial information required to fight spammers at the source.
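The correlation claimed in the abstract, and its proposed origin in superposed Omori sequences, can be illustrated with a toy simulation. The Python sketch below is not the authors' model: it assumes, hypothetically, that each spam campaign starts at a uniformly random time and sends a fixed number of spams whose delays follow an Omori-law rate proportional to 1/(c + t)^p, and it measures correlation with a simple median split of consecutive intertimes. All function names and parameter values are illustrative.

```python
import random
import statistics


def omori_offsets(n_events, c, p, duration, rng):
    """Draw n_events delays in [0, duration] from a rate proportional to
    1 / (c + t)**p (Omori law) by inverting its cumulative integral."""
    offsets = []
    for _ in range(n_events):
        u = rng.random()
        if abs(p - 1.0) < 1e-12:
            # For p = 1 the cumulative rate grows like ln((c + t) / c).
            offsets.append(c * ((c + duration) / c) ** u - c)
        else:
            a = c ** (1.0 - p)
            b = (c + duration) ** (1.0 - p)
            offsets.append((a + u * (b - a)) ** (1.0 / (1.0 - p)) - c)
    return offsets


def superpose_campaigns(n_campaigns, n_events, c, p, duration, horizon, seed=0):
    """Hypothetical model: campaigns start uniformly in [0, horizon] and each
    contributes n_events Omori-distributed spams; return all times, sorted."""
    rng = random.Random(seed)
    times = []
    for _ in range(n_campaigns):
        start = rng.uniform(0.0, horizon)
        times.extend(start + dt for dt in omori_offsets(n_events, c, p, duration, rng))
    times.sort()
    return times


def intertimes(times):
    """Differences between consecutive (sorted) arrival times."""
    return [b - a for a, b in zip(times, times[1:])]


def conditional_means(tau):
    """Mean of the next intertime, given that the current one lies
    below or above the median intertime."""
    med = statistics.median(tau)
    after_small = [nxt for cur, nxt in zip(tau, tau[1:]) if cur < med]
    after_large = [nxt for cur, nxt in zip(tau, tau[1:]) if cur >= med]
    return statistics.mean(after_small), statistics.mean(after_large)


# Illustrative parameters (times in seconds), not fitted to the junk folders.
tau = intertimes(superpose_campaigns(n_campaigns=50, n_events=200, c=10.0,
                                     p=1.0, duration=1e5, horizon=1e6))
small, large = conditional_means(tau)
print(f"next intertime after a small one: {small:.1f} s")
print(f"next intertime after a large one: {large:.1f} s")

# Control: shuffling keeps the intertime distribution but destroys its order.
shuffled = tau[:]
random.Random(1).shuffle(shuffled)
print("shuffled control: %.1f s vs %.1f s" % conditional_means(shuffled))
```

If intertimes are correlated, the mean intertime following a below-median intertime should be clearly smaller than the one following an above-median intertime; the shuffled control removes the ordering, so its two values should be close to each other.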
In this paper, we present a statistical analysis of the spamming process, which may help unveil how spammers operate.
Dataset. – Our analysis has been made possible by modern antispam filters, which discriminate with good accuracy between legitimate e-mails and spams. These filters can be configured so that spams are not erased but collected in a dedicated folder, which we call the junk folder. We have considered four junk folders, J1, J2, J3 and J4, belonging to four academic e-mail accounts of our university (domain 'na.infn.it'). The folders are created by the antispam filter "Sophos" and contain 16 · 10, 27 · 10, 21 · 10 and 7 · 10 spams, respectively. For comparison, we have also considered one standard inbox folder, I, containing 4 · 10 legitimate e-mails. The popularity of the four accounts among spammers varies: the mean intertime between two consecutive spams is 300 s, 700 s, 1100 s and 870 s, respectively. For each e-mail, we have determined the time of arrival and the geographical location of the sender. The delivery time $t_i$ of an e-mail is registered by the incoming mail server. To estimate the error on $t_i$, we set up a script that sends e-mails at a regular interval $t_{\rm delay}$ from an account, Bob (based in the USA), to a different account, Alice (based in Italy). The intertime between two consecutive e-mails delivered to Alice is not constant and equal to $t_{\rm delay}$, but fluctuates. The typical size of these fluctuations (which may depend on internet routing) is 10 s; this value is our estimate of the error on the delivery times. The geographical location of the sender is determined from the sender's IP address [5], which is recorded in the envelope that accompanies every e-mail.
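The two measurements described in the Dataset paragraph, the mean intertime and the timing error estimated from the Bob-to-Alice probe, can be sketched as follows. This is a minimal Python illustration, not the authors' pipeline: it assumes that the delivery time is read from the topmost Received: header written by the incoming server, and it quantifies the typical fluctuation as the root-mean-square deviation of the probe intertimes from $t_{\rm delay}$. The helper names, the sample message and the timestamps are hypothetical.

```python
import email
import math
import statistics
from email.utils import parsedate_to_datetime


def delivery_time(raw_message):
    """Unix time at which the incoming server accepted the message.

    Assumption: the topmost Received: header was added by our own incoming
    server and ends with '; <RFC 2822 date>', its customary layout."""
    msg = email.message_from_string(raw_message)
    received = msg.get_all("Received")[0]
    stamp = received.rsplit(";", 1)[1].strip()
    return parsedate_to_datetime(stamp).timestamp()


def intertimes(times):
    """Differences between consecutive arrival times, after sorting."""
    times = sorted(times)
    return [b - a for a, b in zip(times, times[1:])]


def timing_error(probe_arrivals, t_delay):
    """RMS deviation of the probe intertimes from the sending interval t_delay."""
    dev = [tau - t_delay for tau in intertimes(probe_arrivals)]
    return math.sqrt(statistics.fmean(d * d for d in dev))


# Hypothetical message with a single Received: header.
raw = ("Received: from relay.example.org by mx.example.net; "
       "Tue, 01 Jul 2003 10:52:37 +0200\n"
       "Subject: test\n\nbody\n")
t0 = delivery_time(raw)
print(t0)  # 1057049557.0

# Toy arrival times (seconds); the mean intertime is 300 s.
arrivals = [0.0, 300.0, 600.0, 900.0]
print(statistics.mean(intertimes(arrivals)))  # 300.0

# Toy probe sent every 600 s; deviations of +10, -20, +10 s.
probe = [0.0, 610.0, 1190.0, 1800.0]
print(round(timing_error(probe, t_delay=600.0), 2))  # 14.14
```

With real data, arrivals would hold the delivery times extracted from one junk folder and probe the arrival times of the e-mails sent by the script; header layouts vary between servers, so the parsing step would need checking against the actual messages.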
Similar resources
Relation between volatility correlations in financial markets and Omori processes occurring on all scales.
We analyze the memory in volatility by studying volatility return intervals, defined as the time between two consecutive fluctuations larger than a given threshold, in time periods following stock market crashes. Such an aftercrash period is characterized by the Omori law, which describes the decay in the rate of aftershocks of a given size with time t by a power law with exponent close to 1. A...
Omori Law for Sliding of Blocks on Inclined Rough Surfaces
Long sequences of slidings of solid blocks on an inclined rough surface submitted to small controlled perturbations are examined and scaling relations are found for the time distribution of slidings between pairs of large events as well as after and before the largest events. These scaling laws are similar to the Omori law in seismology but the scaling exponents observed are different. Log-peri...
Mainshocks are Aftershocks of Conditional Foreshocks: How do Foreshock Statistical Properties Emerge from Aftershock Laws
The inverse Omori law for foreshocks discovered in the 1970s states that the rate of earthquakes prior to a mainshock increases on average as a power law $\propto 1/(t_c - t)^{p'}$ of the time to the mainshock occurring at $t_c$. Here, we show that this law results from the direct Omori law for aftershocks describing the power law decay $\sim 1/(t - t_c)^{p}$ of seismicity after an earthquake, provided that any ear...
Sub-critical and Super-critical Regimes in Epidemic Models of Earthquake Aftershocks
We present an analytical solution and numerical tests of the epidemic-type aftershock (ETAS) model for aftershocks, which describes foreshocks, aftershocks and mainshocks on the same footing. In this model, each earthquake of magnitude m triggers aftershocks with a rate proportional to 10. The occurrence rate of aftershocks triggered by a single mainshock decreases with the time from the mainsh...